An Improvement of AdaBoost to Avoid Overfitting
Authors
Abstract
Recent work has shown that combining multiple versions of weak classifiers such as decision trees or neural networks results in reduced test set error. To study this in greater detail, we analyze the asymptotic behavior of AdaBoost. The theoretical analysis establishes the relation between the distribution of margins of the training examples and the generated voting classification rule. The paper shows asymptotic experimental results with RBF networks for the binary classification case, underlining the theoretical findings. Our experiments show that AdaBoost does indeed overfit. In order to avoid this and to obtain better generalization performance, we propose a regularized, improved version of AdaBoost, called AdaBoostReg. We show the usefulness of this improvement in numerical simulations.
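For a voting rule f(x) = sum_t alpha_t h_t(x) / sum_t alpha_t, the margin of a training example (x_i, y_i) with y_i in {-1, +1} is y_i f(x_i), a number in [-1, 1] that measures how confidently the vote classifies the example. The following is a minimal sketch of a boosting loop that tracks these margins and, for C > 0, damps the weights of persistently hard examples; the C-penalty here is an illustration of the soft-margin idea, not necessarily the exact AdaBoostReg update from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, T=100, C=0.0):
    """Boost decision stumps and track the margins of the training set.

    y must be in {-1, +1}. C = 0 gives plain AdaBoost; C > 0 adds an
    illustrative soft-margin penalty that damps the weight of examples
    with high cumulative influence (in the spirit of AdaBoostReg, not
    necessarily the paper's exact update rule).
    """
    n = len(y)
    w = np.full(n, 1.0 / n)     # example weights
    infl = np.zeros(n)          # cumulative influence of each example
    F = np.zeros(n)             # unnormalized vote: sum_t alpha_t * h_t(x_i)
    ensemble = []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        h = stump.predict(X)
        err = np.clip(np.sum(w[h != y]), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, stump))
        F += alpha * h
        infl += alpha * w
        # plain AdaBoost sets w_i proportional to exp(-y_i * F(x_i));
        # the C-term lowers the weight of persistently hard points
        z = -(y * F + C * infl)
        w = np.exp(z - z.max())          # subtract max for numerical stability
        w /= w.sum()
    # margins of the normalized vote, each in [-1, 1]
    rho = y * F / sum(a for a, _ in ensemble)
    return ensemble, rho
```

The returned margin distribution rho is the quantity the theoretical analysis relates to the generated voting rule; on noisy data, driving all margins up can mean fitting the noise, which is the overfitting the regularized variant is designed to avoid.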
Similar resources
A Fast Scheme for Feature Subset Selection to Avoid Overfitting in AdaBoost
AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We show that with the introduction of a scoring function and the random selection of training data it is possible to create a smaller set of feature vectors. The selection of th...

Full text
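The truncated abstract does not specify the paper's scoring function, so the sketch below uses a hypothetical stand-in: features are scored by their absolute correlation with the labels on random subsamples of the training data, and only the top k are kept before boosting.

```python
import numpy as np

def select_features(X, y, k=10, rounds=20, frac=0.5, seed=0):
    """Score features on random subsamples and keep the top k.

    Hypothetical sketch: the score is the absolute feature-label
    correlation averaged over random fractions of the training data;
    the paper's actual scoring function is not given in the abstract.
    y is assumed to be in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    scores = np.zeros(d)
    for _ in range(rounds):
        idx = rng.choice(n, size=max(2, int(frac * n)), replace=False)
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
        scores += np.abs(Xc.T @ yc) / denom
    return np.argsort(scores)[-k:]   # column indices of the k best features
```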
Using Validation Sets to Avoid Overfitting in AdaBoost
AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed ...

Full text
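A minimal sketch of the validation-set half of this recipe, assuming scikit-learn's AdaBoostClassifier: half of the training data is held out, and the ensemble is read off at the boosting round with the best validation accuracy. The dagging component (training committee members on disjoint subsets) is omitted here.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

def boost_with_validation(X, y, T=200, seed=0):
    # Hold out half of the training data as a validation set and pick
    # the boosting round with the best validation accuracy.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.5, random_state=seed)
    clf = AdaBoostClassifier(n_estimators=T, random_state=seed).fit(X_tr, y_tr)
    val_acc = [np.mean(p == y_val) for p in clf.staged_predict(X_val)]
    best_t = int(np.argmax(val_acc)) + 1   # 1-based number of rounds to keep
    return clf, best_t

def predict_at(clf, X, t):
    # Prediction using only the first t boosting rounds.
    for i, pred in enumerate(clf.staged_predict(X), start=1):
        if i == t:
            return pred
```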
Using Validation to Avoid Overfitting in Boosting
AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because it focuses on misclassified examples, which may be noisy. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. The training set is partitioned in...
Full text

Avoiding Boosting Overfitting by Removing Confusing Samples
Boosting methods are known to exhibit noticeable overfitting on some datasets, while being immune to overfitting on other ones. In this paper we show that standard boosting algorithms are not appropriate in the case of overlapping classes. This inadequacy is likely to be the major source of boosting overfitting while working with real world data. To verify our conclusion we use the fact that an...

Full text
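The abstract is truncated before the method details, so the following is only a generic sketch of the removal idea, using out-of-fold misclassification as a stand-in "confusion" signal; the paper's actual criterion may differ.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_predict

def drop_confusing_samples(X, y, seed=0):
    # Generic sketch, not the paper's exact criterion: call an example
    # "confusing" if out-of-fold boosted classifiers misclassify it,
    # remove those examples, and retrain on the cleaned training set.
    oof = cross_val_predict(AdaBoostClassifier(random_state=seed), X, y, cv=5)
    keep = oof == y                      # boolean mask of kept examples
    clf = AdaBoostClassifier(random_state=seed).fit(X[keep], y[keep])
    return clf, keep
```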
An Imprecise Boosting-like Approach to Classification
A new approach for ensemble construction based on restricting a set of weights of examples in training data to avoid overfitting is proposed in the paper. The algorithm called EPIBoost (Extreme Points Imprecise Boost) applies imprecise statistical models to restrict the set of weights. The updating of the weights within the restricted set is replaced by updating the weights in the linear combin...
Full text
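As a rough illustration of the weight-restriction idea: after each boosting round, the updated example weights are projected back into a restricted set, so no single (possibly noisy) example can dominate training. The box constraint below is a hypothetical stand-in; EPIBoost itself derives its restricted set from imprecise statistical models and updates weights via linear combinations of the set's extreme points.

```python
import numpy as np

def restrict_weights(w, eps=3.0):
    """Clip boosting weights to [1/(eps*n), eps/n] and renormalize.

    Illustrative stand-in for EPIBoost's restricted weight set, applied
    after every AdaBoost weight update to keep the distribution close
    to uniform. eps = 1 forces exactly uniform weights; a large eps
    leaves the update effectively unrestricted, recovering AdaBoost.
    """
    n = len(w)
    w = np.clip(w, 1.0 / (eps * n), eps / n)
    return w / np.sum(w)
```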